Identifying Real or Fake Articles: Towards better Language Modeling

Authors

  • Sameer Badaskar
  • Sachin Agarwal
  • Shilpa Arora
Abstract

This paper frames the problem of identifying good features for improving conventional language models, such as trigram models, as a classification task. The idea is to use various syntactic and semantic features extracted from a language to distinguish real-world articles from articles generated by sampling a trigram language model. Good accuracy on this classification task implies that the extracted features capture aspects of the language that a trigram model may not, so such features can be used to improve existing trigram language models. We describe the results of our experiments on the classification task, performed on a Broadcast News corpus, and discuss their implications for language modeling in general.
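
The setup described in the abstract can be illustrated with a small sketch: fit trigram counts on real text, sample a "fake" article from them, and compute a feature that looks beyond the two-word history a trigram model conditions on. This is only a minimal illustration under stated assumptions, not the authors' implementation. The function names, the toy corpus, and the `long_range_repeat_rate` feature are all hypothetical and chosen for the example; the paper's actual features are not listed on this page.

```python
import random
from collections import Counter, defaultdict


def train_trigram(tokens):
    """Count how often each word follows every two-word history."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts


def sample_article(counts, length, rng):
    """Sample tokens one at a time, each drawn given the previous two."""
    out = list(rng.choice(sorted(counts)))  # start from a seen two-word history
    while len(out) < length:
        followers = counts.get((out[-2], out[-1]))
        if not followers:
            break  # history never seen in training: stop early
        words = sorted(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return out


def long_range_repeat_rate(tokens, min_gap=3):
    """Hypothetical feature: share of tokens whose most recent earlier
    occurrence lies at least `min_gap` positions back, i.e. outside the
    two-word history that a trigram model sees."""
    last_seen = {}
    hits = 0
    for i, tok in enumerate(tokens):
        if tok in last_seen and i - last_seen[tok] >= min_gap:
            hits += 1
        last_seen[tok] = i
    return hits / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    # Toy stand-in for a real corpus (assumption for illustration only).
    real = "the cat sat on the mat and the cat sat on the rug".split()
    model = train_trigram(real)
    fake = sample_article(model, len(real), random.Random(0))
    for label, doc in (("real", real), ("fake", fake)):
        print(label, round(long_range_repeat_rate(doc), 3))
```

In a full experiment, feature values like this one would be computed for many real and sampled articles and passed to a classifier; a feature that separates the two classes well points to something the trigram model fails to capture.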

Similar resources

Classification of Fake and Real Articles Based on Support Vector Machines

Fake or real? That is the question, even in the context of languages. In this course project, we are given the task of distinguishing real Broadcast News articles from fake “articles” generated by a trigram model trained from the 100 million word corpus of Broadcast News articles from 1992–1996. This task is clearly not difficult for humans, while machines are not as smart as us to tell whether...

Fake Variables in research

In each journal, the editorial board receives many articles but more than 70% of them are rejected. This happens because there is no real correlation among the variables in these articles or the variables and perceived relations are fake, which means playing with the variables nonexistent in reality. This rejection occurs mainly as a result of the researchers' misinterpretation of the interdisc...

Classifying Articles as Fake or Real Language and Statistics Spring 2007

Is it real or fake? That is the question. A discrimination task that may seem trivial to humans can be extremely complicated for a machine. Humans make use of a “makes sense” feature, which relies on world knowledge including Linguistics, to distinguish between real and fake articles. Unfortunately such a feature does not exist for machines yet. As such, to solve a relatively mundane problem fo...

Unsupervised Content-Based Identification of Fake News Articles with Tensor Decomposition Ensembles

Social media provide a platform for quick and seamless access to information. However, the propagation of false information, especially during the last year, raises major concerns, especially given the fact that social media are the primary source of information for a large percentage of the population. False information may manipulate people’s beliefs and have real-life consequences. Therefore,...

This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News

The problem of fake news has gained a lot of attention as it is claimed to have had a significant impact on 2016 US Presidential Elections. Fake news is not a new problem and its spread in social networks is well-studied. Often an underlying assumption in fake news discussion is that it is written to look like real news, fooling the reader who does not check for reliability of the sources or th...

Publication date: 2008